03:00
This slide deck was built in Quarto!
So far, you’ve:
::: notes One way you can work with the data that you’ve groomed and selected from BigQuery is by exporting it to use elsewhere. Depending on the size of the data and how often the data changes, this might be a great use case, and very convenient. Let’s go over that briefly.
Now, you might think that the Export button as you’re looking at a table would be a great way to get data out of GCP, but this export button is actually for exporting to other Google products, like Looker Studio or Google Cloud Storage. If that’s what you want, fantastic, but let’s assume you want to take a .csv and just run with that data somewhere else. What should you do?
To export a .csv, you need to have query results, and you can save those in various formats. So we have a couple of options.
:::
Since you’ve saved the query you used to create this table, you can re-run that query.
Or, since you’ve created a table that contains all the data you want, you run a “give me everything” query:
SELECT * FROM [table_name]
Run a query – either your saved query or a “SELECT * FROM…” on your saved table.
Then Click the Save Results button and save the .csv to your computer.
03:00
Quick reminder: R is a language, that can be run in many settings:
RStudio is a fully featured IDE that runs on Linux, Windows, and Mac. It’s much more heavy weight than just the language.
Many of us love RStudio. You can spin up an RStudio server in GCP… but you don’t have to, and it might mean spending money when you don’t need to.
You can use your normal RStudio (on your computer or in Posit.cloud) to work with BigQuery.
Visit https://github.com/pm0kjp/rmedicine_2024_bigquery/blob/main/bq_demo.qmd and get the code, but we’ll also do this together bit by bit.
Please open whatever RStudio you typically use. https://posit.cloud is a good option!
When you get there, open a new Quarto or R Markdown document.
03:00
There are several ways to connect to BigQuery (see, e.g. https://solutions.posit.co/connections/db/databases/big-query/) but the way I’m going to show you uses a library called bigrquery.
Please install bigrquery as well as tidyverse, if you don’t already have that installed.
Then you’re going to start a new Quarto or R Markdown document and add a code chunk that loads these two libraries:
Now you need to authenticate so that RStudio can connect to BigQuery. You’ll next type:
bq_auth()
Use the same Google Identity you’re using for GCP. If it suggests installing httpuv, do it, as otherwise authorization might fail.
You might or might not see this – you might get this code passed back automatically with a message that says
Authentication complete. Please close this page and return to R.
The pattern is:
project_id and my_sql_query objectsresults <- bq_project_query(project_id, my_sql_query)df <- bq_table_download(results)Visit https://github.com/pm0kjp/rmedicine_2024_bigquery/blob/main/bq_demo.qmd and get the code, but we’ll also do this together bit by bit.
::: note In the burger menu, select Vertex AI, then Workbench. Enable the api.
:::
Important to shut down: https://cloud.google.com/vertex-ai/docs/workbench/instances/shut-down
In a new Terminal in your Jupyter instance, enter the following:
conda create -n r
conda config --add channels conda-forge
conda install -c conda-forge r-base
conda install -c conda-forge r-essentials
conda install -c conda-forge r-tidyverse
conda install -c conda-forge r-stringr
conda install -c conda-forge r-gargle
conda install -c conda-forge r-bigrquery
conda activate r
Answer “y” when prompted.
File > New > Notebook
Select “R” as the kernel
Joy Payton, Children’s Hospital of Philadelphia